Crop layer for automatically aligning computations #1976
Conversation
This will be useful for keeping track of the coordinate transformations induced by, e.g., convolution and pooling layers.
This allows layers to do things that depend, e.g., on net topology.
Crop layer for automatically aligning computations

Merge 'shelhamer/crop-layer':
- add CropLayer for cropping one blob to another using induced coordinates
- layers get a pointer back to their owning Net
- implement coord_map for all applicable layers
- add FilterMap for the coord mapping used by (de)conv and pooling layers
- add util/coords.hpp for coordinate mapping functions
This is a great thing. I have one suggestion about specifying the operation. The actual data of one of the bottom blobs is not needed; you only need its shape and the coordinate mapping so you can perform the crop. Instead of specifying the "crop like this" blob as a bottom blob, it could instead be referenced by name in another field in a new CropLayerParameter protobuf message. This way, we avoid introducing a split layer (and the additional data copying) that would occur with making it a bottom blob. This reduces GPU memory usage and could allow larger networks to use this.
For those watching, this is due for an update with a less intrusive version (like the "another way" above) that takes advantage of net spec, after which I think it'll be ready for merge. @waldol1, yes, that's a good point about the extra allocation. Unfortunately you can't really specify a layer name in a parameter (without some extra mechanism), since that breaks the layer abstraction: a layer has no way to access other layers except through its top and bottom links (short of the backpointer to its net, which is present here but removed in the "less intrusive version" above). Let's think more about the right way to address that!
@longjon: Does the less intrusive version already exist in some form? I'm willing to pitch in either way to make this ready for merge.
Crop layer for automatically aligning computations (conflicts: include/caffe/common_layers.hpp, include/caffe/layer.hpp, include/caffe/neuron_layers.hpp, include/caffe/vision_layers.hpp, src/caffe/net.cpp)
@longjon Do you know when the netspec version of this PR can be committed? In the meantime I did a naive rebase of this PR here, which segfaults for some reason, 😬, probably due to the now-different sharing of layers from root. I didn't look at this too closely, but for the less intrusive version I guess the idea is for the crop layer to have access to the other layers via the net's layer-sharing functionality (what's the right name for this?). Then the only hurdle would be that the variables that determine the transformation are stored in different formats (kernel_shape_ as a Blob in conv and deconv, vs. kernel_h_, kernel_w_ in pool). This would mean (a) storing all transformations in the same format, or (b) special code in the crop layer to know where it should look. --This is pure conjecture. BR, Max
Sorry for the wait, folks; I hope to have an update on this next week. You may want to take a look at my most recently rebased version (still fairly out-of-date, but less than this PR) at https://github.com/longjon/caffe/tree/future. @BlGene: not really, the new way I prefer to do this is:
This moves the magic from inside a layer to outside the net, preserving the layers of abstraction without hindering future functionality; as a bonus it makes the crop values discoverable to the user, and can be adapted for other features that rely on computing the coordinate maps. More details to come!
Yes, that would be even less invasive. I updated my rebase so that the crop layer now works this way. The layer only works for 4D blobs; one could in theory extend it. The next step would be to write a demo python script that calculates appropriate crop offset parameters for the FCN net, for example. BR, Max
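Such a demo script might look like the following sketch. Everything here is hypothetical (the helper names and layer-tuple format are made up for illustration, using the usual receptive-field arithmetic rather than Caffe's actual coord_map code): each layer is summarized as (kernel, stride, pad), the affine coordinate maps of two paths are composed, and the crop offset is the integer translation between them.

```python
def path_map(layers):
    # Compose per-layer affine maps (scale, shift); the result takes an
    # output index of the final layer to the corresponding input coordinate
    # of the first layer. Each layer is a (kernel, stride, pad) tuple.
    a, b = 1, 0.0
    for kernel, stride, pad in layers:
        a, b = a * stride, a * ((kernel - 1) / 2.0 - pad) + b
    return a, b

def crop_offset(crop_path, ref_path):
    # Offset at which to crop the blob produced by crop_path so that its
    # coordinates line up with the blob produced by ref_path.
    a0, b0 = path_map(crop_path)
    a1, b1 = path_map(ref_path)
    assert a0 == a1, "coordinate scales differ; crop cannot align these blobs"
    off = (b1 - b0) / a0
    assert off >= 0 and off == int(off), "mapping is not an integer crop"
    return int(off)

# A padded 3x3 conv preserves coordinates; an unpadded 5x5 conv shifts
# them by 2, so the padded blob must be cropped starting at offset 2.
print(crop_offset([(3, 1, 1)], [(5, 1, 0)]))  # -> 2
```

With such a script the offsets could be filled into a plain crop parameter at net-specification time, which is exactly the "outside the net" approach @longjon describes above.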
Replaced by N-D crop in #3570.
This is the master edition of #1639 -- thanks to a rebase by @philkr. After #1974 and #1975.

Existing layers shift and warp coordinate space: translation by padding (or lack thereof), contraction by strided convolution or pooling, and expansion by strided deconvolution (#1615). Often one wants to align two blobs, e.g., to establish a correspondence between input and output, or to fuse two different paths of computation. Counting conv/deconv strides to ensure that blob coordinates have the same scale is generally straightforward; computing the offset between two blobs that results from intermediate padding and kernel sizes is trickier.
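As a rough sketch (not Caffe's internal API; the function names here are invented, and the formula is the standard receptive-field geometry), the coordinate transformation induced by a conv or pooling layer can be written as an affine map from top coordinates back to bottom coordinates, and maps of stacked layers compose:

```python
# Hypothetical sketch: the affine coordinate map induced by a conv/pool
# layer, taking a top (output) index to the bottom (input) coordinate at
# the center of its receptive field:
#   x_bottom = stride * x_top + (kernel - 1) / 2 - pad
def filter_map(kernel, stride=1, pad=0):
    return (stride, (kernel - 1) / 2.0 - pad)

def compose(outer, inner):
    # Apply `inner` (a later layer's map) then `outer`, yielding the map
    # from the later layer's top all the way back to `outer`'s bottom.
    a1, b1 = outer
    a2, b2 = inner
    return (a1 * a2, a1 * b2 + b1)

conv = filter_map(kernel=3, stride=1, pad=1)  # (1, 0.0): coordinate-preserving
pool = filter_map(kernel=2, stride=2, pad=0)  # (2, 0.5): contracts by 2
print(compose(conv, pool))                    # conv followed by pool -> (2, 0.5)
```

The scale component tracks the stride counting mentioned above, while the shift component is exactly the offset from padding and kernel sizes that is trickier to keep straight by hand.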
This layer takes two bottom blobs and produces one top, which is a copy of the first bottom cropped to the size of the second so that coordinates exactly correspond, i.e., it makes sense to fuse or compare the top blob with the second bottom, regardless of whatever padding or other shenanigans took place between their computation.
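In numpy terms, the forward pass amounts to an offset spatial copy. A minimal sketch, assuming 4D N x C x H x W blobs and already-computed offsets (the function name and the offsets here are illustrative, not taken from the layer):

```python
import numpy as np

def crop_like(bottom0, bottom1, offset_h, offset_w):
    # Copy bottom0, cropped to bottom1's spatial extent at the given offsets.
    h, w = bottom1.shape[2], bottom1.shape[3]
    return bottom0[:, :, offset_h:offset_h + h, offset_w:offset_w + w].copy()

a = np.arange(2 * 3 * 6 * 6, dtype=np.float32).reshape(2, 3, 6, 6)
b = np.zeros((2, 3, 4, 4), dtype=np.float32)
top = crop_like(a, b, 1, 1)
print(top.shape)  # (2, 3, 4, 4): the first bottom cropped to the second
```

What the layer adds over this one-liner is computing the offsets automatically from the net, as described next.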
This is done by computing the coordinate mapping between the two bottom blobs, as provided by #1637 and made accessible by #1638. If that mapping is a simple translation, and has the right sign to allow the first blob to be "cropped to" the second, the layer simply performs the copy. If the mapping is not an integer translation, or the translation has the wrong sign, an error is thrown, and the net may be rearranged to allow sensible fusion.
The implementation of `LayerSetUp` amounts to some simple graph traversal to find the path connecting the two bottoms. Currently `Net` does not provide great facilities for traversing the layer graph, so it's a bit cumbersome; maybe this can be improved in the future.

There is a bit of engineering involved in these three PRs, but the result is pretty convenient: what was before a tricky offline calculation becomes a trivial layer specification.
Another way to implement this, without #1974, would be to remove the graph traversal from `CropLayer`, giving it a simple parameter instead, and provide some other mechanism for automatically filling in that parameter.

Currently CPU and GPU implementations (trivially) are provided, but tests and documentation are not.